Visual Basic 6, ActiveX and Unicode
One of the older applications I support uses ActiveX controls embedded inside a web page. These controls request data from a web server to update the information on the page without requesting the whole page again, much in the same way that AJAX is now commonly used.
This has worked fine for the Latin code pages (ISO 8859-1, ISO 8859-15) and for the one double-byte code page (cp950) that has been tested. However, it did not work when I tried UTF-8.
The reason for this is fairly simple:
VB stores strings internally using Unicode, but assumes that the outside world is ANSI.
This means that Visual Basic will convert from ANSI to Unicode (UTF-16) when a string comes in from the outside world (files, controls, API calls), and convert it back to ANSI again when the string goes back out.
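StrConv() exposes an equivalent ANSI conversion explicitly, which makes the loss easy to see. The following is a minimal sketch, not code from the controls; the function name is made up for illustration, and the result for non-ASCII text depends on the system ANSI code page.

```vb
' Returns True when strText survives a trip through the system ANSI code page.
' SurvivesAnsiRoundTrip is a hypothetical helper name, not part of VB6.
Public Function SurvivesAnsiRoundTrip(ByVal strText As String) As Boolean
    Dim bytAnsi() As Byte

    ' UTF-16 string -> ANSI bytes; characters with no ANSI equivalent are substituted
    bytAnsi = StrConv(strText, vbFromUnicode)

    ' ANSI bytes -> UTF-16 string, then compare with the original
    SurvivesAnsiRoundTrip = (StrConv(bytAnsi, vbUnicode) = strText)
End Function
```

Plain ASCII text such as "ABC" passes this check. On a Western (cp1252) system, a character outside that code page, for example ChrW$(&H4E2D), would be expected to fail it.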
The ActiveX controls use the Microsoft Inet control to request data via HTTP. They call the GetChunk() method in the StateChanged event to read the data into a string. This was the first cause of my problems, as Visual Basic automatically applies its ANSI conversion to the data as it goes into the string, which loses the Unicode characters.
The Inet control's GetChunk() method takes two parameters: size and data type. The size parameter tells it how much data to read, and the data type parameter tells it what data type to read it into. The data was being read into a string (icString); to avoid the automatic conversion I had to change this to a byte array (icByteArray).
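As a rough sketch of that change, reading one chunk as raw bytes might look like this. It assumes the code lives in a form or control hosting an Inet control named Inet1; both that name and the helper name are assumptions for illustration, not taken from the original controls.

```vb
' Assumes a hosting form/control with an Inet control named Inet1 (hypothetical name).
Private Function ReadChunkAsBytes(ByVal lngSize As Long) As Byte()
    Dim vntChunk As Variant
    Dim bytChunk() As Byte

    ' icByteArray asks for the raw bytes instead of a converted string (icString)
    vntChunk = Inet1.GetChunk(lngSize, icByteArray)

    ' Move the Variant's byte array into a typed Byte() array
    bytChunk = vntChunk
    ReadChunkAsBytes = bytChunk
End Function
```

A real StateChanged handler would typically call something like this repeatedly and append the chunks until no more data comes back; that loop is left out here to keep the sketch short.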
So far so good. But now I had a UTF-8 byte array that I needed to convert into a string without losing data. This was a bit of a sticking point, as Visual Basic’s string conversion function StrConv() can’t cope with UTF-8 and none of the API calls I found to convert the string worked. You can assign a byte array directly to a string and no automatic conversion happens, but because strings are stored internally as UTF-16, the UTF-8 bytes are simply reinterpreted as UTF-16 code units and the text comes out wrong.
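To illustrate why the direct assignment fails, here is a small sketch using the two-byte UTF-8 encoding of e-acute (U+00E9). The helper and demo names are hypothetical and exist only for this example.

```vb
' Copies the bytes into a String without any conversion.
' RawBytesToString is a hypothetical name used only for this illustration.
Public Function RawBytesToString(ByRef data() As Byte) As String
    Dim strOut As String

    ' Plain assignment: every pair of bytes becomes one UTF-16 code unit
    strOut = data
    RawBytesToString = strOut
End Function

Public Sub DemoRawAssignment()
    Dim bytUtf8(0 To 1) As Byte
    Dim strRaw As String

    ' UTF-8 encoding of e-acute (U+00E9) is the two bytes C3 A9
    bytUtf8(0) = &HC3
    bytUtf8(1) = &HA9

    strRaw = RawBytesToString(bytUtf8)

    ' Len(strRaw) is 1, but the character is code unit A9C3 (bytes read
    ' little-endian), not U+00E9, so this comparison is False
    Debug.Print strRaw = ChrW$(&HE9)
End Sub
```

No data is lost in the copy, but the result is not the text the server sent, which is why a real UTF-8 decoding step is still needed.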
I was nearly at the stage where I either needed to write my own conversion process, or re-develop the controls in another language with better UTF-8 support.
Then I found this solution:
    Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String
        Dim objStream As ADODB.Stream
        Dim strTmp As String

        ' init stream
        Set objStream = New ADODB.Stream
        objStream.Charset = "utf-8"
        objStream.Mode = adModeReadWrite
        objStream.Type = adTypeBinary
        objStream.Open

        ' write bytes into stream
        objStream.Write data
        objStream.Flush

        ' rewind stream and read text
        objStream.Position = 0
        objStream.Type = adTypeText
        strTmp = objStream.ReadText

        ' close up and return
        objStream.Close
        ConvertUtf8BytesToString = strTmp
    End Function
This does not use any API calls, but it requires a project reference to the Microsoft ActiveX Data Objects 2.5 Library or later.
Using this solution I was able to assign the original internal string variable to the result of this function and the rest of the code in the controls worked.
    strWSConnectReturnData = ConvertUtf8BytesToString(bytWSConnectReturnData)
The ActiveX controls also read data values from the webpage and POST them back to the webserver. The values are read via the DOM. These also need to be converted in the opposite direction, before they can be URL encoded.
    Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()
        Dim objStream As ADODB.Stream
        Dim data() As Byte

        ' init stream
        Set objStream = New ADODB.Stream
        objStream.Charset = "utf-8"
        objStream.Mode = adModeReadWrite
        objStream.Type = adTypeText
        objStream.Open

        ' write text into stream
        objStream.WriteText strText
        objStream.Flush

        ' rewind stream and read bytes
        objStream.Position = 0
        objStream.Type = adTypeBinary
        objStream.Read 3 ' skip first 3 bytes as this is the utf-8 marker
        data = objStream.Read()

        ' close up and return
        objStream.Close
        ConvertStringToUtf8Bytes = data
    End Function
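The three bytes skipped by the function are the UTF-8 marker (byte order mark), which is the sequence EF BB BF. If you would rather check for it than assume it is there, a small test along these lines could be used on a byte array read from position 0. This is only a sketch; HasUtf8Bom is a hypothetical helper name and it assumes the array has been allocated.

```vb
' Returns True when data starts with the UTF-8 byte order mark (EF BB BF).
' HasUtf8Bom is a hypothetical helper; data is assumed to be an allocated array.
Public Function HasUtf8Bom(ByRef data() As Byte) As Boolean
    Dim lngFirst As Long

    lngFirst = LBound(data)

    ' Fewer than three bytes cannot hold the marker
    If UBound(data) - lngFirst + 1 < 3 Then Exit Function

    HasUtf8Bom = (data(lngFirst) = &HEF) _
             And (data(lngFirst + 1) = &HBB) _
             And (data(lngFirst + 2) = &HBF)
End Function
```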
ConvertStringToUtf8Bytes returns a byte array, which I pass directly to a function that URL-encodes the bytes and returns a string.
    strEncodedValue = URLEncodeUTF8ByteArray(ConvertStringToUtf8Bytes(DomValue))
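The URL-encoding function itself is not shown in this post. As an illustration of the idea only, and not the actual URLEncodeUTF8ByteArray code, a byte-wise percent-encoder could be sketched like this. It leaves the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") alone and writes every other byte as %XX; the function name is hypothetical.

```vb
' Percent-encodes a byte array; a sketch, not the original URLEncodeUTF8ByteArray.
Public Function UrlEncodeBytesSketch(ByRef data() As Byte) As String
    Dim i As Long
    Dim strOut As String

    For i = LBound(data) To UBound(data)
        Select Case data(i)
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                ' 0-9, A-Z, a-z, "-", ".", "_", "~" pass through unchanged
                strOut = strOut & Chr$(data(i))
            Case Else
                ' Everything else becomes % followed by two hex digits
                strOut = strOut & "%" & Right$("0" & Hex$(data(i)), 2)
        End Select
    Next i

    UrlEncodeBytesSketch = strOut
End Function
```

Because the encoder works on bytes rather than characters, each byte of a multi-byte UTF-8 sequence gets its own %XX escape; the two bytes C3 A9 come out as %C3%A9.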
Many thanks to Tim Hastings for his solution, as this has saved me a lot of pain!
Interesting to see how you got around this one.
I still think you need to make a case to lose this legacy code though 🙂
If you want to make it easy to support Unicode in Visual Basic then take a look at the UniToolbox control suite, which replaces all the common VB controls with Unicode-aware versions:
http://www.iconico.com/UniToolbox
That is just fantastic. Had been fighting this damn issue all day, and was also about to give up on it, when I found this.
I was trying to read an ASCII file and then write it out to another file (RTF format) and it kept adding 2 bytes at the start of the document, so Word or Wordpad did not like it anymore.
So from reading your recipe I got the idea to just advance the position 2 bytes like this, and it works wonders.
objStream.Position = 2
GetFile = objStream.ReadText
Here, full Unicode support in both design mode and during runtime for VB6.
Complete source code.
OptionButton
Checkbox
Label
CommandButton
File I/O
Clipboard I/O
Other routines for putting Unicode into the caption of any VB control with a hWnd, including forms.
Just download from here and you’re all set:
http://motionlabresources.org/Unicode%20&%20RTF%20for%20VB6.zip
Elroy, thanks for that. Been a long time since I’ve been actively doing any VB6 development so I haven’t verified it. Hopefully it will help someone else out.
Hey Paul, it’s just source code, so whoever gets it can check it out themselves. Occasionally I’ve needed some special characters and always done a hack but I finally decided to just bite the bullet and develop some nice Unicode controls for VB6. Interestingly, the problem has always been the PropertyBag (and the Properties Window). Captions and the RichTextBox (as well as VB6 strings) have always done Unicode. I just bit the bullet and figured out how to get Unicode in the PropertyBag (it was actually quite easy, just a byte array in a variant, and it gets stored in the .FRX file) and used the RichTextBox for allowing editing of the caption property during design time. I still maintain many many thousands of lines of code written in VB6, and I’m just getting more and more in a mindset of sharing these days. You take care.